
Add SkillsBench task source preflight guard #809

Merged
huangruiteng merged 1 commit into main from codex/skillsbench-task-source-preflight on Jun 28, 2026



@huangruiteng

Owner

Summary

  • fail fast when a requested SkillsBench task is absent from the canonical tasks source before launching a full runner arm
  • carry public-safe task-source preflight evidence through compact run output and ledger entries
  • classify task-source selection as a repairable benchmark setup issue and cover it with a focused smoke test
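The fail-fast guard summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: the names `preflight_task_sources`, `TaskSourcePreflightError`, `PreflightResult`, the `repairable_setup` label, and the layout in which each task is a subdirectory of a canonical `tasks/` directory (alongside a separate sanity directory) are all hypothetical. What it shows is that the membership check runs, and raises, before any runner arm is launched, and that ids found only in the sanity source are reported separately from ids that are unknown everywhere.

```python
"""Hypothetical sketch of a SkillsBench task-source preflight check.

All names and the directory layout are illustrative assumptions,
not the implementation merged in this PR.
"""
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class PreflightResult:
    requested: list[str]
    missing: list[str]      # requested ids absent from the canonical source
    sanity_only: list[str]  # subset of `missing` that exists only in the sanity source
    # Assumed label; "repairable_setup" marks a setup problem the operator can fix.
    classification: str = "ok"


class TaskSourcePreflightError(RuntimeError):
    """Raised before any runner arm is launched."""

    def __init__(self, result: PreflightResult) -> None:
        super().__init__(
            f"task source preflight failed: missing={result.missing} "
            f"sanity_only={result.sanity_only}"
        )
        self.result = result


def list_task_ids(root: Path) -> set[str]:
    # Assumption: each task is a subdirectory named after its task id.
    if not root.is_dir():
        return set()
    return {p.name for p in root.iterdir() if p.is_dir()}


def preflight_task_sources(
    requested: list[str], canonical_dir: Path, sanity_dir: Path
) -> PreflightResult:
    canonical = list_task_ids(canonical_dir)
    sanity = list_task_ids(sanity_dir)
    missing = [t for t in requested if t not in canonical]
    sanity_only = [t for t in missing if t in sanity]
    result = PreflightResult(list(requested), missing, sanity_only)
    if missing:
        # Fail fast: the caller never reaches the expensive runner launch.
        result.classification = "repairable_setup"
        raise TaskSourcePreflightError(result)
    return result
```

Raising an exception (rather than returning a flag) is one way to make it hard for a caller to launch the runner arm by accident; the attached result still carries the evidence for reporting.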

Validation

  • python3 -m py_compile scripts/skillsbench_automation_loop.py loopx/benchmark_adapters/skillsbench.py loopx/benchmark_ledger.py loopx/status.py examples/skillsbench-task-source-preflight-smoke.py
  • python3 examples/skillsbench-task-source-preflight-smoke.py
  • git diff --check
  • loopx check --scan-path scripts/skillsbench_automation_loop.py --scan-path loopx/benchmark_adapters/skillsbench.py --scan-path loopx/benchmark_ledger.py --scan-path loopx/status.py --scan-path examples/skillsbench-task-source-preflight-smoke.py

@huangruiteng

Owner Author

Self-review summary:

  • Scope: benchmark helper/runtime guard only; no scoring, task semantics, leaderboard, permission, or job-launch changes.
  • Behavior: fail fast when a requested SkillsBench task is absent from the canonical tasks source, before spending a full runner arm; preserve only public-safe task-source preflight evidence in compact output and the ledger.
  • Validation passed locally:
    • python3 -m py_compile scripts/skillsbench_automation_loop.py loopx/benchmark_adapters/skillsbench.py loopx/benchmark_ledger.py loopx/status.py examples/skillsbench-task-source-preflight-smoke.py
    • python3 examples/skillsbench-task-source-preflight-smoke.py
    • git diff --check
    • loopx check on the five changed paths
  • Merge status: normal merge is blocked by branch policy; auto-merge is disabled for this repository.
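Preserving "only public-safe" evidence can be approached with an allowlist projection before anything is written to compact output or the ledger. The sketch below is hypothetical: the key names in `PUBLIC_SAFE_KEYS`, the `ledger_entry` helper, and the one-JSON-line-per-run format are assumptions for illustration, since the PR text does not spell out its schema.

```python
"""Hypothetical sketch: project preflight evidence down to public-safe fields.

Key names and the ledger line format are illustrative assumptions.
"""
from __future__ import annotations

import json
from typing import Any

# Assumed allowlist: task ids and the classification label are treated as
# publishable; anything else (e.g. local absolute paths, raw logs) is dropped.
PUBLIC_SAFE_KEYS = ("requested", "missing", "sanity_only", "classification")


def public_safe_evidence(evidence: dict[str, Any]) -> dict[str, Any]:
    # Copy only allowlisted keys that are actually present.
    return {k: evidence[k] for k in PUBLIC_SAFE_KEYS if k in evidence}


def ledger_entry(run_id: str, evidence: dict[str, Any]) -> str:
    # One compact JSON line per run; sort_keys keeps the output stable across runs.
    entry = {"run_id": run_id, "task_source_preflight": public_safe_evidence(evidence)}
    return json.dumps(entry, sort_keys=True, separators=(",", ":"))
```

An allowlist fails closed: a field added to the raw evidence later stays out of public output until someone explicitly decides it is safe, which a denylist would not guarantee.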

@huangruiteng

Owner Author

Findings: none from this review pass.

Open questions / assumptions: I treated this as a benchmark helper/runtime preflight guard. It fails fast, before any runner spend, when a requested SkillsBench id is present only in the sanity source and absent from the canonical tasks/ directory. I did not see any scoring, task-semantics, leaderboard, submission, or raw-evidence boundary changes.

Validation performed:

  • python3 examples/skillsbench-task-source-preflight-smoke.py passed on the PR head.

Merge decision: hold for the normal reviewer/branch-protection flow; no blocker found in the diff or the focused preflight smoke test.

@huangruiteng huangruiteng merged commit 05042b3 into main on Jun 28, 2026
@huangruiteng huangruiteng deleted the codex/skillsbench-task-source-preflight branch June 28, 2026 09:54